ABSTRACT

The H I Parkes All Sky Survey (HiPASS) is a blind extragalactic H I 21-cm emission line
survey covering the whole southern sky from declination -90° to +25°. The HiPASS
catalogue (HiCAT), containing 4315 H I-selected galaxies from the region south of
declination +2°, is presented in Meyer et al. (2004a, Paper I). This paper describes in detail
the completeness and reliability of HiCAT, which are calculated from the recovery rate
of synthetic sources and follow-up observations, respectively. HiCAT is found to be 99
per cent complete at a peak flux of 84 mJy and an integrated flux of 9.4 Jy km s$^{-1}$.
The overall reliability is 95 per cent, but rises to 99 per cent for sources with peak
fluxes > 58 mJy or integrated flux > 8.2 Jy km s$^{-1}$. Expressions are derived for the
uncertainties on the most important HiCAT parameters: peak flux, integrated flux,
velocity width, and recessional velocity. The errors on HiCAT parameters are dominated
by the noise in the HiPASS data, rather than by the parametrization procedure.

Key words: methods: observational - methods: statistical - surveys - radio lines: 
galaxies - galaxies: statistics 
1 INTRODUCTION

The H I Parkes All Sky Survey (HiPASS) is a blind neutral
hydrogen survey over the entire sky south of declination +25°.
One of the main objectives of the survey is to extract a
sample of H I-selected extragalactic objects, which can be
employed to study the local large scale structure and the
properties of galaxies in a manner free from optical selection
effects. In Meyer et al. (2004a, hereafter Paper I) we present
the HiPASS sample of 4315 H I-selected objects from the
region south of declination +2°. This sample, which we refer
to as HiCAT, forms the largest catalogue of extragalactic
H I-selected objects to date. In Paper I the selection procedure
of HiCAT is described in detail, along with a discussion of the
global sample properties and a description of the catalogue
parameters that have been released to the public. The
scientific potential of HiCAT is very large, but to make optimal
use of the catalogue it is essential that the completeness and
reliability are well understood and quantified. Only after an
accurate assessment of the completeness and reliability is it
possible to extract from the observed sample the intrinsic
properties of the local galaxy population.

For optically-selected galaxy samples, this procedure 
is relatively straightforward since most optically-selected 
galaxy samples are purely flux-limited, possibly complicated 
by the reduced detection efficiency of objects with low
optical surface brightness (see e.g., Lin et al. 1999, Strauss et
al. 2002, Norberg et al. 2002). Since the H I 21-cm emission
of galaxies is localized in a narrow region of velocity space, 
blind 21-cm surveys need to cover the two spatial dimensions 
and the velocity dimension simultaneously. The advantage 
of this is that the survey yields redshifts simultaneously with 
the object detections, and follow-up redshift surveys are not 
required. However, this extra dimension complicates the de- 
tection efficiency. The 'detectability' of a 21-cm signal de- 
pends not only on the flux, but also on how this flux is 
distributed over the velocity width of the signal. 

In this paper we take an empirical approach to this 
problem, and determine the completeness of HiCAT by the 
recovery rate of synthetic sources that have been inserted in 
the data. The reliability is determined by follow-up obser- 
vations of a large number of sources. Our aim is to describe 
in detail the completeness and the reliability of HiCAT as 
a function of various catalogue parameters, in such a way 
that future users can make optimal use of HiCAT in studies 
of e.g., the H I mass function, the local large scale structure,
the Tully-Fisher relation, etc. We also discuss in detail the
errors on the HiCAT parameters, determine expressions to 
estimate errors and estimate what fraction of the error is 
determined by noise and what fraction by the parametriza- 
tion. 

The organization of this paper is as follows: in Section 2
a brief review of the HiPASS survey is given. In Section 3 the
completeness of HiCAT is calculated using three independent
methods. Section 4 details the follow-up observations and
the evaluation of the reliability. In Section 5 errors on HiCAT
parameters are calculated.



2 THE HIPASS SURVEY

The observing strategy and reduction steps of HiPASS are
described in detail in Barnes et al. (2001). A full description of
the galaxy finding procedure and the source parametrization
is given in Paper I. Here we briefly summarize the HiPASS
specifics.

The observations were conducted in the period from
1997 to 2000 with the Parkes$^1$ 64-m radio telescope, using
the 21-cm multibeam receiver (Staveley-Smith et al. 1996).
The telescope scanned strips of 8° in declination and data
were recorded for thirteen independent beams, each with
two polarizations. A total of 1024 channels over a total
bandwidth of 64 MHz were recorded, resulting in a mean
channel separation of $\Delta v = 13.2$ km s$^{-1}$ and a velocity
resolution of $\delta v = 18$ km s$^{-1}$ after Tukey smoothing. The
data are additionally Hanning smoothed for parameter
fitting to improve signal-to-noise, giving a final velocity
resolution of 26.4 km s$^{-1}$. The total velocity coverage is -1280 to
12700 km s$^{-1}$. After bandpass calibration, continuum
subtraction and gridding into 8° × 8° cubes, the typical
root-mean-square (rms) noise is 13 mJy beam$^{-1}$. This leads to a
3σ column density limit of ≈ 6 × 10$^{17}$ cm$^{-2}$ per channel for
gas filling the beam. The spatial resolution of the gridded
data is 15.5 arcmin.

The basic absolute calibration method used for HiPASS
is described by Barnes et al. (2001). The absolute flux
scale was determined during the first HiPASS observations in
February 1997 by calibrating a noise diode against the radio
sources Hydra A and 1934-638 with known amplitudes
(relative to the Baars et al. 1977 flux scale). The calibration was
checked regularly (on average three times each year) by
re-observing the two calibration sources. The rms of the flux
measurements averaged over all 13 beams and two
polarizations is 2 per cent, which gives a good indication of the
stability of the absolute flux calibration.

Two automatic galaxy finding algorithms were applied
to the HiPASS data set to identify candidate sources. To
avoid confusion with the Milky Way Galaxy and high
velocity clouds, the range $v_{\rm GSR} < 300$ km s$^{-1}$ was excluded
from the list. The resulting list of potential detections was
subjected to a series of independent manual checks. First, to
quickly separate radio frequency interference and bandpass
ripples from real H I sources, two manual checks were done
examining the full detection spectra. Detections that were
not rejected by both checks were then examined in spectral,
position, RA-velocity, and dec-velocity space. Finally, the
detections were parametrized interactively using standard
MIRIAD (Sault, Teuben, & Wright 1995) routines. This final
catalogue of H I-selected sources is referred to as HiCAT.

$^1$ The Parkes telescope is part of the Australia Telescope, which
is funded by the Commonwealth of Australia for operation as a
National Facility managed by CSIRO.

3 COMPLETENESS

The completeness $C$ of a sample is defined as the fraction
of sources from the underlying distribution that is detected
by the survey. For an H I-selected galaxy sample, $C$ is
dependent on the peak flux, $S_{\rm p}$, and the velocity width, $W$, or
alternatively on a combination of both.

One way of determining the completeness is through
analytical methods. For example, for the AHISS sample
presented in Zwaan et al. (1997), a 'detectability' parameter
was calculated, which depended on the distance of the
detection from the center of the beam, the variation of feed
gain with frequency, the velocity width, and the integrated
flux. The completeness was assumed to be 100 per cent if the
detectability > 1, corresponding to the requirement that $S_{\rm p}$
be larger than 5 times the local rms noise level after optimal
smoothing. This analytically derived detectability was then
compared with the survey data and proved to be a
satisfactory description of the survey completeness. Rosenberg &
Schneider (2002) used an empirical approach to assess the 
completeness of their Arecibo Dual-Beam Survey (ADBS), 
by inserting a large number of synthetic sources through- 
out the survey data. By determining the rate at which the 
synthetic sources could be recovered, they established the 
completeness, which they expressed as a function of signal- 
to-noise. 

In this paper we also choose to assess the complete- 
ness of the HiPASS sample by inserting in the data a large 
number of synthetic sources, prior to running the automatic 
galaxy finding algorithms (see Paper I). The actual process 
of source selection is a multi-step process, which is partly au- 
tomated and partly based on by-eye verification. It is there- 
fore preferable to study the completeness empirically instead 
of analytically. 

The synthetic sources were constructed to resemble real 
sources, and were divided into three groups based on their 
spectral shapes: Gaussian, double-horned, and flat-topped. 
The sources were not spatially extended. The velocity width,
peak flux, and position of each synthetic source were chosen
randomly, and were drawn from a uniform parent
distribution that spans the range 20 to 650 km s$^{-1}$ in $W$, the range
20 to 130 mJy in $S_{\rm p}$, and the range 300 to 10000 km s$^{-1}$ in
velocity. Care was taken not to place synthetic sources on
top of real sources. This was done by using the results of
an automatic galaxy finding algorithm that was run prior to
the insertion of the synthetic sources. A total of 1200
synthetic sources were inserted in the HiPASS data cubes, with
approximately equal numbers of each of the three profile
types.

In Fig. 1 we show a greyscale representation of the
completeness of the HiPASS sample in the $S_{\rm p}, W$ plane and in
the $S_{\rm p}, S_{\rm int}$ plane, where $S_{\rm p}$ is the peak flux density in Jy,
$W$ is the velocity width in km s$^{-1}$, and $S_{\rm int}$ is the integrated flux
in Jy km s$^{-1}$. The completeness in these plots is simply
determined by calculating $D$, the fraction of fake sources that
is recovered in each bin:

$$D(S_{\rm p}, W) = N^{\rm sim}_{\rm rec}(S_{\rm p}, W)\,/\,N^{\rm sim}(S_{\rm p}, W). \qquad (1)$$



In order to calculate the completeness as a function of
one parameter, we need to integrate along one of the axes,
and apply a weighting to account for the varying number of
sources in each bin. Put differently, the completeness $C$ is
the number of detected real sources $N$ divided by the total
number of true sources in each bin, which we estimate with
$N/D$. For example, the completeness as a function of $S_{\rm p}$,
determined from the $S_{\rm p}, W$ matrix, is given by

$$C(S_{\rm p}) = \frac{\sum_{W=0}^{\infty} N(S_{\rm p}, W)}{\sum_{W=0}^{\infty} N(S_{\rm p}, W)/D(S_{\rm p}, W)}. \qquad (2)$$

This weighting corrects for the fact that the parameter
distribution of the synthetic sources might be different
from that of the underlying real galaxy distribution.
Similarly, $C(W)$ can be determined by integrating over $S_{\rm p}$, and
$C(S_{\rm int})$ can be determined by integrating over a $S_{\rm p}, S_{\rm int}$
matrix. Hereafter, we refer to $C$ as the differential completeness
since it refers to the completeness at a certain value of $S_{\rm p}$,
$S_{\rm int}$, or $W$.

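It is often convenient to calculate the cumulative
completeness $C^{\rm cum}$. For example, $C^{\rm cum}(S_{\rm p})$ is the completeness
for all sources with peak fluxes larger than $S_{\rm p}$:

$$C^{\rm cum}(S_{\rm p}) = \frac{\sum_{S'_{\rm p}=S_{\rm p}}^{\infty} \sum_{W=0}^{\infty} N(S'_{\rm p}, W)}{\sum_{S'_{\rm p}=S_{\rm p}}^{\infty} \sum_{W=0}^{\infty} N(S'_{\rm p}, W)/D(S'_{\rm p}, W)}. \qquad (3)$$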
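The weighting in Eqs. 2 and 3 can be illustrated with a minimal Python sketch. It assumes the counts of real detections and the recovered fractions of synthetic sources are already binned into matrices whose rows are peak-flux bins and whose columns are width bins. The function names, the toy numbers, and the choice to skip bins with $D = 0$ are assumptions made for illustration only and are not taken from the HiPASS pipeline.

```python
def differential_completeness(n_real, d_frac):
    """Eq. (2): completeness for each peak-flux bin (one value per row).

    n_real[i][j] -- detected real sources in bin (S_p bin i, W bin j)
    d_frac[i][j] -- recovered fraction D of synthetic sources in that bin
    """
    result = []
    for n_row, d_row in zip(n_real, d_frac):
        # Bins with D == 0 cannot be corrected and are skipped (assumption).
        usable = [(n, d) for n, d in zip(n_row, d_row) if d > 0]
        detected = sum(n for n, _ in usable)
        true_est = sum(n / d for n, d in usable)  # N/D estimates the true count
        result.append(detected / true_est if true_est > 0 else float("nan"))
    return result


def cumulative_completeness(n_real, d_frac):
    """Eq. (3): completeness for all sources at or above each peak-flux bin."""
    result = []
    for i in range(len(n_real)):
        detected = 0.0
        true_est = 0.0
        for n_row, d_row in zip(n_real[i:], d_frac[i:]):
            for n, d in zip(n_row, d_row):
                if d > 0:
                    detected += n
                    true_est += n / d
        result.append(detected / true_est if true_est > 0 else float("nan"))
    return result


# Toy matrices: two peak-flux bins (rows) by two width bins (columns).
n_real = [[4, 2], [10, 10]]
d_frac = [[0.5, 0.25], [1.0, 1.0]]
print(differential_completeness(n_real, d_frac))
print(cumulative_completeness(n_real, d_frac))
```

Because each bin is weighted by the real counts $N$, the result does not depend on how the synthetic sources happen to be distributed over the bins, which is the point made in the text.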



3.1 Results 

Fig. 2 shows the result of this analysis. The circles show the
differential completeness as a function of $S_{\rm p}$, $W_{50}$, and $S_{\rm int}$,
and the triangles show the cumulative completeness. Error
bars indicate 68 per cent confidence levels and are
determined by bootstrap re-sampling$^2$. We fit the completeness
as a function of $S_{\rm p}$ and $S_{\rm int}$ with error functions (erf), which
are indicated by solid lines. The best-fitting error functions
are given in Table 1 along with the completeness limits at
95 and 99 per cent.

Clearly, there is not a sharp segregation between de- 
tectable and not detectable for any of the three parameters 
under examination. The completeness is a slowly varying 
function, which illustrates the complexity of the
detectability of H I signals. However, all curves reach the 100 per cent
completeness level. This indicates that our source finding al- 
gorithms do not miss any high signal-to-noise sources, and 
our system of checking all potential sources for possible con- 
fusion with RFI is sufficiently conservative that it does not 
cause many false negatives. 

Although the above derived expressions are useful for 
understanding the completeness of HiCAT, they do not allow 
us to calculate completeness levels for individual sources. For 
many purposes, for example in evaluating the H I mass func- 
tion, it is convenient to know what the completeness of the 
catalogue is for a source with specific parameters. We tested 
different fitting functions and found that the completeness 
can be fitted satisfactorily using two parameters: 

$$C(S_{\rm p}, S_{\rm int}) = {\rm erf}[0.036(S_{\rm p} - 19)]\,{\rm erf}[0.36(S_{\rm int} - 1.1)], \qquad (4)$$

where $S_{\rm p}$ is in mJy and $S_{\rm int}$ is in Jy km s$^{-1}$ (see Table 1).
This provides an accurate fit to the completeness matrices
shown in Fig. 1 and also reproduces the one-parameter fits
shown in Fig. 2 after the proper summation given by Eq. 2
has been applied. In Fig. 3 the 50, 75, and 95 per cent
completeness limits calculated using Eq. 4 are drawn on
top of the parameter distribution of the full HiCAT data.
The contours in the $S_{\rm p}, W$-plane are calculated by
assuming $W = 1.22 S_{\rm int}/S_{\rm p}$, which provides a good fit to the data.
Unfortunately, the regions of parameter space that are most 
densely populated are severely incomplete, as is generally 
true for samples that do not have a sharp completeness limit. 
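For readers who want to apply Eq. 4 to individual sources, the following Python sketch evaluates the two-parameter fit. The function names are hypothetical. Two points are assumptions rather than statements from the text: each erf factor is clipped at zero so that sources below both offsets are not assigned a spurious positive completeness, and the width relation is evaluated with $S_{\rm p}$ in Jy so that $W$ comes out in km s$^{-1}$.

```python
import math


def hicat_completeness(s_peak_mjy, s_int_jykms):
    """Two-parameter completeness of Eq. (4); S_p in mJy, S_int in Jy km/s."""
    # Below the offsets erf() is negative; the product of two negative
    # factors would look like a positive completeness, so clip at zero.
    c_peak = max(0.0, math.erf(0.036 * (s_peak_mjy - 19.0)))
    c_int = max(0.0, math.erf(0.36 * (s_int_jykms - 1.1)))
    return c_peak * c_int


def width_from_fluxes(s_peak_jy, s_int_jykms):
    """Approximate width W = 1.22 S_int / S_p, assuming S_p is given in Jy."""
    return 1.22 * s_int_jykms / s_peak_jy


# Example: a source with S_p = 60 mJy and S_int = 6 Jy km/s.
c = hicat_completeness(60.0, 6.0)
print(c, width_from_fluxes(0.060, 6.0))
```

A typical use would be to weight each source by $1/C$ when estimating a space density, with a floor on $C$ chosen by the user to keep the weights bounded.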

$^2$ From the parent population of $N$ synthetic sources, $N$ sources
are chosen randomly, with replacement. This is repeated 200 times
and for each of these 200 re-generated samples the completeness
$C'$ is calculated following equations 2 and 3. The 1σ upper and
lower errors on the completeness are determined by measuring
from the distribution of $C'$ the 83.5 and 16.5 percentiles.
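The re-sampling described in the footnote can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it bootstraps the recovered fraction in a single bin rather than the full weighted completeness of equations 2 and 3, it uses a nearest-rank percentile, and the function name and the fixed random seed are hypothetical choices.

```python
import random


def bootstrap_interval(recovered, n_resamples=200, lower=16.5, upper=83.5, seed=0):
    """Percentile bootstrap on the recovered fraction of synthetic sources.

    recovered -- list of 0/1 flags, one per synthetic source in the bin
    Returns the (lower, upper) percentiles of the resampled fractions.
    """
    rng = random.Random(seed)
    n = len(recovered)
    fractions = []
    for _ in range(n_resamples):
        # Draw N sources from the N available, with replacement.
        sample = [recovered[rng.randrange(n)] for _ in range(n)]
        fractions.append(sum(sample) / n)
    fractions.sort()

    def percentile(p):
        # Nearest-rank percentile; adequate for a sketch.
        k = round(p / 100.0 * (len(fractions) - 1))
        return fractions[min(len(fractions) - 1, max(0, k))]

    return percentile(lower), percentile(upper)


# Example: 100 synthetic sources in a bin, 80 of them recovered.
low, high = bootstrap_interval([1] * 80 + [0] * 20)
print(low, high)
```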
Figure 1. Bivariate completeness in the $S_{\rm p}, W$ plane and the $S_{\rm p}, S_{\rm int}$ plane. Darker colors correspond to higher completeness. The
contours indicate completeness levels of 50, 75, and 95 per cent (from left to right).

Figure 2. Completeness of HiCAT as measured from the detection rate of synthetic sources. Solid circles show the differential completeness
and triangles the cumulative completeness. The solid lines are error function fits to the points, with the fit parameters given in Table 1.
Error bars indicate 68 per cent confidence levels.



Table 1. Completeness

parameter                                              completeness                                                       C = 0.95   C = 0.99

$S_{\rm p}$ (mJy)                                      erf[0.028($S_{\rm p}$ - 19)]                                       68         84
$S_{\rm int}$ (Jy km s$^{-1}$)                         erf[0.22($S_{\rm int}$ - 1.1)]                                     7.4        9.4
$S_{\rm p}$ (mJy), $S_{\rm int}$ (Jy km s$^{-1}$)      erf[0.036($S_{\rm p}$ - 19)] erf[0.36($S_{\rm int}$ - 1.1)]
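As a consistency check on Table 1, the one-parameter error-function fits can be inverted numerically to recover the tabulated limits. The Python sketch below does this by bisection using only the standard library. The function name is hypothetical, and the search bracket is an assumption that relies on erf being monotonic and effectively equal to 1 for arguments of 10 or more.

```python
import math


def completeness_limit(slope, offset, level):
    """Solve erf[slope * (x - offset)] = level for x by bisection."""
    lo = offset                      # erf(0) = 0, below any positive level
    hi = offset + 10.0 / slope       # erf(10) is 1 to machine precision
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.erf(slope * (mid - offset)) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


fits = [("S_p (mJy)", 0.028, 19.0), ("S_int (Jy km/s)", 0.22, 1.1)]
for name, slope, offset in fits:
    limits = [completeness_limit(slope, offset, c) for c in (0.95, 0.99)]
    print(name, [round(x, 2) for x in limits])
```

The values obtained this way lie within rounding of the 95 and 99 per cent limits quoted in the table for both $S_{\rm p}$ and $S_{\rm int}$.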




Figure 3. Bivariate parameter distribution of HiCAT. Darker colors correspond to higher source densities. Analytical approximations 
of the completeness limits at 50, 75, and 95 per cent (from left to right) are indicated by curves. 



By cutting HiCAT at the 95 (99) per cent completeness limit, 
the sample is reduced to 2209 (1678) sources. 



3.2 Verification of completeness limits 

Among blind extragalactic H I surveys, HiPASS is unique 
in the sense that it is fully noise-limited. Surveys such as 
AHISS or the ADBS are partly bandwidth-limited, which 
means that the brightest galaxies in the sample can only be 
detected out to the distance limit set by the restricted band- 
width of the receiving system. Since HiPASS is a relatively 
shallow survey and was conducted with a large bandwidth 
(64 MHz), even the detection of the most H I-massive
galaxies is noise-limited. The distance distribution $N(D)$ of HiCAT
galaxies drops to zero at large distances, before the
maximum distance of 12700 km s$^{-1}$ is reached (see Paper I). This
property of HiCAT enables the use of standard techniques to
verify the completeness limits determined in Section 3.1. For
bandwidth-limited samples these methods would not give 
meaningful results. 

Rauzy (2001) recently suggested a new tool to as- 
sess the completeness for a given apparent magnitude in a 
magnitude-redshift sample. This method is easily adapted to 
an H I-selected galaxy sample. Essentially, the method
compares the number of galaxies brighter and fainter than every
galaxy in the sample. In the case of a homogeneously
distributed sample in space, the method is essentially the same
as a $V/V_{\rm max}$ test, but by design Rauzy's method is
insensitive to structure in redshift space. The method is based on
the definition of a random variable $\zeta$, which for an H I-selected
sample can be defined as

$$\zeta = \frac{\Theta(M_{\rm HI})}{\Theta(M_{\rm HI}^{\rm lim}(Z))}, \qquad (5)$$



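where $\Theta$ is the cumulative H I mass function, $Z$ is a
'distance modulus' defined as $Z = \log S_{\rm int} - \log M_{\rm HI}$, and
$M_{\rm HI}^{\rm lim}(Z)$ is the limiting H I mass at the distance
corresponding to $Z$. An unbiased estimate of $\zeta$ for object $i$ is given
by

$$\hat\zeta_i = \frac{r_i}{n_i + 1}, \qquad (6)$$

where $r_i$ is the number of objects with $M_{\rm HI} \geq M_{{\rm HI},i}$ and
$Z \geq Z_i$, and $n_i$ is the number of objects for which
$M_{\rm HI} \geq M_{\rm HI}^{\rm lim}(Z_i)$ and $Z \geq Z_i$. The values of $\hat\zeta_i$ should be uniformly
distributed between 0 and 1. Now a quantity $T_C$ can be
defined as

$$T_C = \frac{\sum_{i=1}^{N_{\rm gal}} (\hat\zeta_i - 1/2)}{\left(\sum_{i=1}^{N_{\rm gal}} V_i\right)^{1/2}}, \qquad (7)$$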



where $V_i$ is the variance of $\hat\zeta_i$, defined as
$V_i = (n_i - 1)/[12(n_i + 1)]$. The completeness of the sample can now
be estimated by computing $T_C$ on truncated subsamples
according to a decreasing $S_{\rm int}$. For statistically complete
subsamples the quantity $T_C$ has an expectation value of
zero and unity variance. The completeness limit is found
when $T_C$ drops systematically to negative values, where
$T_C \approx -2$ ($-3$) indicates a 97.7 (99.4) per cent confidence
level. In the top panel of Fig. 4 we plot the result of the $T_C$
completeness test. From this we derive that the
completeness limit of the sample is $S_{\rm int}^{\rm lim} = 9.5$ Jy km s$^{-1}$ at the 97.7
per cent confidence level. This limit is very close to what was
found in the previous section, where we calculated the
completeness based on the detection rate of synthetic sources.
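The $T_C$ statistic of Eqs. 6 and 7 can be computed for one trial flux limit with the short Python sketch below. It is an illustration under stated assumptions, not the code used for HiCAT: the function name is hypothetical, each object is counted in its own $r_i$ and $n_i$, and the comparison on $Z$ selects objects at least as nearby as object $i$, which with $Z = \log S_{\rm int} - \log M_{\rm HI}$ corresponds to $Z \geq Z_i$ because this $Z$ decreases with distance.

```python
import math


def rauzy_tc(log_mhi, log_sint, log_slim):
    """T_C for the subsample with S_int >= S_lim (all quantities in log10)."""
    # Keep only sources above the trial limit; store (log M_HI, Z) pairs.
    sample = [(m, s - m) for m, s in zip(log_mhi, log_sint) if s >= log_slim]
    numerator = 0.0
    variance = 0.0
    for m_i, z_i in sample:
        log_mlim = log_slim - z_i  # limiting H I mass at this Z
        # Objects at least as nearby as object i (assumed sign convention).
        nearby = [m for m, z in sample if z >= z_i]
        r_i = sum(1 for m in nearby if m >= m_i)
        n_i = sum(1 for m in nearby if m >= log_mlim)
        numerator += r_i / (n_i + 1) - 0.5
        variance += (n_i - 1) / (12.0 * (n_i + 1))
    return numerator / math.sqrt(variance) if variance > 0 else float("nan")


# Tiny hand-checkable example with three sources and log S_lim = 0.
print(rauzy_tc([9.0, 8.5, 10.0], [1.0, 1.5, 0.5], 0.0))
```

In practice the function would be evaluated for a sequence of decreasing trial limits, and the completeness limit read off where $T_C$ falls systematically below $-2$ or $-3$. The double loop scales as the square of the sample size, which is acceptable for a few thousand sources.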

As a final verification we plot in the bottom panel of
Fig. 4 the number of galaxies as a function of $S_{\rm int}$. The
dotted line shows a $dN \propto S_{\rm int}^{-5/2}\,dS_{\rm int}$ distribution expected
for a flux-limited sample, and is scaled vertically so as to fit
the right hand side of the curve. Deviations from the curve
start to become apparent at $S_{\rm int} = 10$ Jy km s$^{-1}$, which is
consistent with the more accurate determination from the
$T_C$ method. Unlike the $T_C$ method, this method of plotting
$dN$ as a function of $S_{\rm int}$ is sensitive to the effects of large
scale structure.



Figure 4. Test of completeness limits in HiCAT. The top panel
shows the $T_C$ estimator (see text) as a function of integrated flux.
The completeness limit is reached at $S_{\rm int} = 9.5$ Jy km s$^{-1}$, where
$T_C = -2$. The bottom panel shows the number of sources as a
function of $S_{\rm int}$. The fitted line corresponds to the expected
distribution $dN \propto S_{\rm int}^{-5/2}\,dS_{\rm int}$ for a flux-limited sample. The
completeness limit of 9.5 Jy km s$^{-1}$ is indicated by a vertical
dashed line.

3.3 Completeness as a function of sky position

HiPASS achieves 100 per cent coverage over the whole
southern sky and has a mostly uniform noise level of
13.0 mJy beam$^{-1}$. However, in some regions of the sky the
noise level is elevated due to the presence of strong radio
continuum sources. In Fig. 5 the median noise level of
every HiPASS cube is shown. These noise levels are determined
robustly using the estimator $\sigma = s(\pi/2)^{1/2}$, where $s$ is the
median absolute deviation from the median. This
estimator is much less sensitive to outliers than the straight rms
calculation, and provides an accurate estimate of the rms
of the underlying distribution, provided that this
distribution is nearly normal. The average cube noise level is
elevated by more than 10 per cent over just 14.8 per cent of the
sky, and by more than 20 per cent over 6.2 per cent
of the sky. A region of elevated noise levels can be clearly
identified in Fig. 5, where the highest noise values go up
to 22 mJy beam$^{-1}$. This region corresponds very closely to
the Galactic Plane, where the strongest radio continuum sources
are located and where the density of continuum sources is
highest.
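A robust noise estimate of the kind described above can be sketched in Python with the standard library. The function name is hypothetical. The default scale factor follows the expression quoted in the text; it is exposed as a parameter because the widely used normal-consistency constant for the median absolute deviation is about 1.4826, whereas $(\pi/2)^{1/2} \approx 1.2533$ is the constant that relates the mean absolute deviation to the rms of a normal distribution. Which convention the survey pipeline applied cannot be settled from this sketch.

```python
import math
import statistics


def robust_sigma(values, factor=math.sqrt(math.pi / 2.0)):
    """Scale the median absolute deviation from the median by `factor`."""
    med = statistics.median(values)
    abs_dev = [abs(v - med) for v in values]
    return factor * statistics.median(abs_dev)


# A single strong outlier barely affects the robust estimate.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
print(robust_sigma(data))
print(robust_sigma(data, factor=1.4826))
print(statistics.pstdev(data))  # straight rms about the mean, for contrast
```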

Figure 5. Median noise levels in the southern HiPASS cubes.
The south celestial pole is in the center, RA = 0 is on top, and
RA increases counter-clockwise. The scale bar shows the noise levels
in mJy beam$^{-1}$. The horizontal bright band corresponds to $b = 0$,
where the noise level is elevated. The numbers correspond to the
numbers of the 8° × 8° HiPASS cubes.

It is not straightforward to assess accurately how the
completeness is affected by varying noise levels. Since a
significantly different noise level is only observed over a small
region of the sky, the number of synthetic sources in these
regions is too small to calculate the completeness limits
accurately. Furthermore, the region of highest noise levels
coincidentally lies in the direction of the Local Void, where the
detection rate of sources is naturally depressed. Therefore,
the $T_C$ method and a simple $V/V_{\rm max}$ method are also
unreliable estimators of the completeness here. In the absence
of empirical estimators, we make the reasonable assumption
that the detection efficiency scales linearly with the local
noise level, which means that the completeness $C(S_{\rm p})$ can
be replaced with $C(S_{\rm p} \times 13.0/\sigma)$ in regions of atypical noise
levels. This implies that the 95 per cent completeness level,
which is normally reached at 71 mJy, is reached at 85 mJy
when the noise level is elevated by 20 per cent. The
completeness as a function of $W_{50}$ is probably not affected by a
slight increase in noise level. The completeness as a function
of $S_{\rm int}$ is adjusted similarly to $C(S_{\rm p})$.
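The noise rescaling described above amounts to evaluating the completeness at an effective peak flux. A minimal Python sketch follows, assuming the one-parameter error-function fit for $S_{\rm p}$ from Table 1 is the curve being rescaled. The function name, the clipping at zero, and the default argument are illustrative assumptions.

```python
import math

NOMINAL_RMS_MJY = 13.0  # typical HiPASS cube noise quoted in the text


def completeness_peak(s_peak_mjy, local_rms_mjy=NOMINAL_RMS_MJY):
    """C(S_p) with S_p replaced by S_p * 13.0 / sigma in noisy regions."""
    scaled = s_peak_mjy * NOMINAL_RMS_MJY / local_rms_mjy
    # One-parameter fit from Table 1, clipped at zero below the offset.
    return max(0.0, math.erf(0.028 * (scaled - 19.0)))


# A source in a cube with 20 per cent higher noise needs a 20 per cent
# higher peak flux to reach the same completeness.
print(completeness_peak(71.0))
print(completeness_peak(71.0 * 1.2, local_rms_mjy=13.0 * 1.2))
print(completeness_peak(71.0, local_rms_mjy=13.0 * 1.2))
```

The first two calls return the same value by construction, while the third shows the lower completeness of a 71 mJy source in the noisier cube.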

3.4 Completeness as a function of profile shape 

In order to test the detection efficiency of various profile 
shapes, the synthetic sources were divided into three groups: 
Gaussian, double-horned, and flat-topped. We perform the
completeness analysis for each of these subsamples
individually, and show the results in Fig. 6. Within the errors,
the detection efficiency as a function of peak flux is
independent of profile shape. However, $C(W_{50})$ and $C(S_{\rm int})$ are
somewhat depressed for double-horned profiles with respect
to Gaussian and flat-topped profiles. The reason for this is
probably that low signal-to-noise double-horned profiles are
easily mistaken for two noise peaks, whereas Gaussian and
flat-topped profiles have their flux distributed over
adjoining channels, which together stand out from the noise more
clearly. 

Figure 6. Completeness of HiCAT as a function of profile shape. Solid circles indicate Gaussian profiles, open boxes double-horned
profiles, and crosses flat-topped profiles. The points for the Gaussian and flat-topped profiles are offset horizontally to avoid overlapping
of points.

3.5 Completeness as a function of velocity

In Fig. 6 of Paper I we show that the velocity distribution
of the initial sample of potential HiCAT detections shows
strong peaks at known RFI frequencies and frequencies
corresponding to hydrogen recombination lines. This might give
rise to the concern that the completeness of HiCAT is
suppressed at these frequencies. However, in Paper I we show
that the three-dimensional signature of these contaminating
signals is sufficiently characteristic that they can be reliably
removed from the catalogue. The final distribution of HiCAT
velocities shows no features that correlate with RFI or
hydrogen radio recombination line frequencies, indicating that
the completeness is not significantly affected at these
frequencies. Unfortunately, we are not able to further
substantiate this claim since 1200 uniformly distributed synthetic
sources provide insufficient velocity sampling to study the
completeness as a function of velocity in detail.



4 RELIABILITY 

The reliability of the sample was determined by re-observing 
a subsample of sources with the Parkes Telescope. The aim 
of the observations was twofold: assessing the reliability of 
HiCAT as a function of peak flux, integrated flux and velocity 
width, and removing spurious detections from the catalogue. 
The subsample was chosen in such a way that the full range 
of HiCAT parameters is represented, but preference was given 
to those detections that have low integrated fluxes. For ev- 
ery observing session, a sample was created that consisted of 
randomly chosen HiCAT detections, complemented with
detections with low $S_{\rm int}$ (generally lower than 8 Jy km s$^{-1}$). At
the time of the observations, the observer chose randomly 
from these samples. The full range of RA was covered by 
the observations. 

The observations were carried out over five observing
sessions between September 2001 and November 2002. They
were done in narrow-band mode, which gives 1024 channels
over 8 MHz, resulting in a spectral resolution of 1.65 km s$^{-1}$
at $z = 0$. In this narrow-band correlator setting only the
inner 7 beams of the multibeam system are available. An
observing mode was used where the target is placed
sequentially in each of the 7 beams and a composite off-source
spectrum is calculated from the other 6 beams. This
strategy yields a noise level 1.85 times lower than standard on-off
observations in the same amount of time. Typical
integration times were 15 min. The narrow-band observations yield
lower rms noise levels than standard broad-band multibeam
observations. Furthermore, the high frequency resolution
enables better checks of the reality of HiCAT sources since
narrow signals can be detected in several independent channels.
The data were reduced using the AIPS++ packages
LIVEDATA and GRIDZILLA (Barnes et al. 2001), and the detections
were parametrized using standard MIRIAD routines.
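The quoted channel spacings follow directly from the correlator settings. The short Python sketch below converts a bandwidth and channel count into a velocity width at the rest frequency of the H I line. The function name is hypothetical, and the use of the rest frequency (that is, $z = 0$) for the conversion is an assumption that matches the statement in the text.

```python
C_KMS = 299792.458        # speed of light in km/s
HI_REST_MHZ = 1420.40575  # rest frequency of the H I 21-cm line in MHz


def channel_width_kms(bandwidth_mhz, n_channels, freq_mhz=HI_REST_MHZ):
    """Velocity width of one channel, evaluated at frequency freq_mhz."""
    channel_mhz = bandwidth_mhz / n_channels
    return C_KMS * channel_mhz / freq_mhz


# Narrow-band follow-up: 1024 channels over 8 MHz.
print(round(channel_width_kms(8.0, 1024), 2))
# HiPASS survey mode: 1024 channels over 64 MHz.
print(round(channel_width_kms(64.0, 1024), 1))
```

The two results correspond to the narrow-band spacing given here and to the mean channel separation of the survey data quoted in Section 2.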

First, we consider the reliability of the original cata- 
logue, before unconfirmed sources have been taken out. The 
fraction of sources that was confirmed is defined as 

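$$T(S_{\rm p}, W) = N_{\rm conf}^{\rm rel}(S_{\rm p}, W)\,/\,N_{\rm obs}^{\rm rel}(S_{\rm p}, W), \qquad (8)$$

where $N_{\rm conf}^{\rm rel}$ and $N_{\rm obs}^{\rm rel}$ are the number of confirmed and
observed sources, respectively. The reliability as a function
of peak flux $S_{\rm p}$ is the mean of $T$, weighted by the number
of sources in each bin:

$$R(S_{\rm p}) = \frac{\sum_{W=0}^{\infty} N(S_{\rm p}, W)\,T(S_{\rm p}, W)}{\sum_{W=0}^{\infty} N(S_{\rm p}, W)}, \qquad (9)$$

and the cumulative reliability is

$$R^{\rm cum}(S_{\rm p}) = \frac{\sum_{S'_{\rm p}=S_{\rm p}}^{\infty} \sum_{W=0}^{\infty} N(S'_{\rm p}, W)\,T(S'_{\rm p}, W)}{\sum_{S'_{\rm p}=S_{\rm p}}^{\infty} \sum_{W=0}^{\infty} N(S'_{\rm p}, W)}. \qquad (10)$$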



Again, analogous methods can be used to measure $R(W)$
and $R(S_{\rm int})$. Fig. 7 shows the measured reliability as a
function of $S_{\rm p}$, $W_{50}$, and $S_{\rm int}$. The crosses show the differential
reliability. Error bars indicate 68 per cent confidence levels
and are determined by bootstrap re-sampling the data 200
times.

As sources that were re-observed but not confirmed
were taken out of the catalogue, by re-observing a
subsample of sources we improve the catalogue reliability.
Eventually, if we were to re-observe all sources, the reliability would
rise to 100 per cent. To calculate the reliability after taking
out unconfirmed sources, we have to estimate the expected
number of real sources, which is the number of confirmed
sources plus $T$ times the number of sources that have not
been observed.

A final complication arises because a second subsample
of sources from HiCAT was re-observed as part of a program
to measure accurate velocity widths (Meyer et al. 2004b).
This program also influences the reliability because
non-detections were taken out of the catalogue and detections
are marked as 'confirmed' in HiCAT. This latter class of
sources is indicated as $N_{\rm conf}^{\rm NB}$. Now, the expected number
of real sources is given by

$$N_{\rm exp.real} = N_{\rm conf}^{\rm rel} + N_{\rm conf}^{\rm NB} + (N - N_{\rm conf}^{\rm rel} - N_{\rm conf}^{\rm NB}) \times (N_{\rm conf}^{\rm rel}/N_{\rm obs}^{\rm rel}). \qquad (11)$$

Note that the total number of sources in HiCAT, $N$, excludes
all unconfirmed sources. Now, we can redefine $T$ as

$$T(S_{\rm p}, W) = N_{\rm exp.real}(S_{\rm p}, W)\,/\,N(S_{\rm p}, W), \qquad (12)$$

and equations 9 and 10 can be used to calculate the
reliability of HiCAT. The circles and triangles in Fig. 7 show
the measured differential and cumulative reliability,
respectively. In total, 1201 sources were observed, of which 119
were rejected.
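The bookkeeping behind the expected number of real sources and the weighted reliability can be made concrete with a small Python sketch. The function names, argument names, and example counts are hypothetical; the arithmetic follows the description in the text, namely confirmed sources plus the confirmation rate times the number of sources that were never re-observed.

```python
def expected_real(n_total, n_conf_rel, n_obs_rel, n_conf_nb):
    """Expected number of real sources in one bin after cleaning (Eq. 11).

    n_total    -- sources left in the catalogue bin (unconfirmed ones removed)
    n_conf_rel -- confirmed sources from the reliability follow-up
    n_obs_rel  -- observed sources from the reliability follow-up
    n_conf_nb  -- sources confirmed by the velocity-width program
    """
    confirmation_rate = n_conf_rel / n_obs_rel  # Eq. (8)
    unobserved = n_total - n_conf_rel - n_conf_nb
    return n_conf_rel + n_conf_nb + unobserved * confirmation_rate


def reliability(n_bins, t_bins):
    """Mean of T weighted by the number of sources per bin (Eq. 9)."""
    return sum(n * t for n, t in zip(n_bins, t_bins)) / sum(n_bins)


# One bin: 100 catalogue sources, 18 of 20 re-observed sources confirmed,
# plus 10 sources confirmed by the second program.
n_real = expected_real(100, 18, 20, 10)
t_bin = n_real / 100  # Eq. (12)
print(n_real, t_bin, reliability([100, 50], [t_bin, 1.0]))
```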



4.1 Results 

The overall reliability is very high (95 per cent), partly
because the catalogue was cleaned up considerably by
re-observing many sources and rejecting unconfirmed sources
from the catalogue. The reliability drops significantly for
$S_{\rm p} < 50$ mJy and $S_{\rm int} < 5$ Jy km s$^{-1}$, and there is
possibly a reduced reliability around $W_{50} = 350$ km s$^{-1}$. This
latter feature may be related to the confusion of real H I
emission signals with ripples in the spectral passband. We
fit the reliability as a function of peak flux and integrated
flux with error functions, the parameters of which are
presented in Table 2. The 99 per cent reliability level is reached
at $S_{\rm p} = 58$ mJy and $S_{\rm int} = 8.2$ Jy km s$^{-1}$. If sources with
a HiCAT comment '2=have concerns' are removed from the
sample, the overall reliability rises to 97 per cent. Similarly
to the results found for the completeness levels, we find that
the reliability of individual sources can be determined
satisfactorily as a function of $S_{\rm p}$ and $S_{\rm int}$. The functional form
is given in Table 2.



5 PARAMETER UNCERTAINTIES 

A detailed description of all measured parameters in HiCAT
is presented in Paper I. Here we discuss the error estimates
of the most important parameters: peak flux, integrated
flux, velocity width, heliocentric recessional velocity ($cz$)
and sky position. Other authors have discussed analytical
approaches to estimating uncertainties on H I 21-cm
parameters (e.g., Schneider et al. 1990, Fouque et al. 1990,
Verheijen & Sancisi 2001), but for HiCAT sufficient comparison
data are available to measure the errors empirically. In this
analysis we make use of the synthetic source parameters and
the narrow-band observations to determine the total
observational errors on the parameters. The data published in the
HiPASS BGC (Koribalski et al. 2004) are used to establish
what fraction of the error is caused by the parametrization
procedure.

We assume that the error $\sigma(X)$ on parameter $X$ can be
satisfactorily described by

$$\sigma(X) = c_1 y^n + c_2, \qquad (13)$$

where $y$ is a parameter that can be equal to $X$ or any
other parameter, and $n$, $c_1$, and $c_2$ are constants. There is
no physical basis for this analytical description of the errors,
but we find later that Eq. 13 provides satisfactory fits to
the measured parameter uncertainties. In the following we
determine how each $\sigma(X)$ depends on all parameters.

When comparing parameters from different data sets, 
we know that the measured rms scatter on the difference 
between HiCAT parameter X and parameter X from data 
set Z is given by

\sigma(X)_{\rm meas}^2 = \sigma(X)_{\rm HiCAT}^2 + \sigma(X)_Z^2,   (14)



where σ(X)_meas is the measured rms scatter on X_HiCAT -
X_Z, σ(X)_HiCAT is the error in the HiCAT parameter, and
σ(X)_Z is the error in data set Z. The latter two parameters
are unknown, but we can make the simplifying assumption
that

\sigma(X)_Z = \sigma(X)_{\rm HiCAT} \, \frac{{\rm rms}_Z}{{\rm rms}_{\rm HiCAT}},   (15)

where rms_Z denotes the rms noise in the survey on which
catalogue Z is based, and rms_HiCAT is the rms noise in the
HiPASS data.
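
Combining Eq. 14 and Eq. 15 gives a direct relation between the measured scatter and the HiCAT error. The short derivation below is a sketch that follows only from those two equations; the symbol r for the noise ratio is introduced here for convenience and is not part of the original notation.

```latex
\begin{align*}
r &\equiv \frac{{\rm rms}_Z}{{\rm rms}_{\rm HiCAT}}, \\
\sigma(X)_{\rm meas}^2
  &= \sigma(X)_{\rm HiCAT}^2 + r^2\,\sigma(X)_{\rm HiCAT}^2
   = \left(1 + r^2\right)\sigma(X)_{\rm HiCAT}^2, \\
\sigma(X)_{\rm HiCAT}
  &= \frac{\sigma(X)_{\rm meas}}{\sqrt{1 + r^2}} .
\end{align*}
```

Two limiting cases follow directly: for a noise-free comparison set r = 0 and the measured scatter equals the HiCAT error, while for a comparison set with the same noise as HiPASS r = 1 and the measured scatter is larger by a factor of sqrt(2).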



5.1 Error estimates from comparison with 
synthetic sources 

First, we compare the HiCAT parameters with those of the
synthetic sources. This comparison is particularly useful
for estimating errors because the synthetic source
parameters are noise free, which means that σ(X)_fake = 0 for all
parameters. Therefore, the measured σ(X)_meas is equal to
σ(X)_HiCAT, which is the parameter of interest. In Fig. 8 we
plot the difference between the measured HiCAT parameters
and parameters of the synthetic sources that were inserted
into the data. The left panels show the difference histograms,
fitted by Gaussian profiles. Parameters for these are
indicated in the top left corners. The right panels show the
differences as a function of the measured HiCAT parameters.
The points and error bars show the zero-points and widths
of the fitted Gaussians (inner and outer error bars indicate
1σ and 3σ, respectively) in different bins. We prefer
Gaussian fitting to calculating straight rms values because the
latter estimator is much more sensitive to outliers. In the
right panels we indicate the best-fitting relations for 3σ(X)
by dashed lines.
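
The preference for Gaussian fitting over a straight rms can be illustrated with a toy data set. The sketch below is not the fitting code used for HiCAT: it generates hypothetical difference values (a Gaussian core with σ = 11 plus a small tail of outliers, both chosen here for illustration) and fits a Gaussian to the histogram with a simple least-squares grid search written in plain Python.

```python
import math
import random


def rms(values):
    """Plain root-mean-square of the values about zero."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def histogram(values, lo, hi, width):
    """Return (bin centres, counts) for equal-width bins on [lo, hi)."""
    nbins = int(round((hi - lo) / width))
    counts = [0] * nbins
    for v in values:
        k = int((v - lo) // width)
        if 0 <= k < nbins:
            counts[k] += 1
    centres = [lo + (k + 0.5) * width for k in range(nbins)]
    return centres, counts


def fit_gaussian(centres, counts, mus, sigmas):
    """Least-squares grid search for A * exp(-(x - mu)^2 / (2 sigma^2)).

    For each trial (mu, sigma) the best amplitude A has a closed form,
    so only mu and sigma need to be scanned.
    """
    best = None
    for mu in mus:
        for sigma in sigmas:
            g = [math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in centres]
            amp = sum(c * gi for c, gi in zip(counts, g)) / sum(gi * gi for gi in g)
            chi2 = sum((c - amp * gi) ** 2 for c, gi in zip(counts, g))
            if best is None or chi2 < best[0]:
                best = (chi2, mu, sigma)
    return best[1], best[2]


# Hypothetical differences: Gaussian core plus a tail of positive outliers.
random.seed(1)
core = [random.gauss(0.0, 11.0) for _ in range(2000)]
outliers = [random.uniform(60.0, 150.0) for _ in range(60)]
diffs = core + outliers

centres, counts = histogram(diffs, -60.0, 160.0, 4.0)
mus = [-5.0 + 0.5 * i for i in range(21)]
sigmas = [5.0 + 0.25 * i for i in range(101)]
mu_fit, sigma_fit = fit_gaussian(centres, counts, mus, sigmas)
rms_all = rms(diffs)
print(f"Gaussian-fit sigma = {sigma_fit:.2f}, plain rms = {rms_all:.2f}")
```

With only about 3 per cent of the points in the tail, the plain rms is inflated well above the width of the core, whereas the fitted Gaussian width stays close to the input value.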

As expected, the error on Sp is independent of peak
flux. The measurement error is set by the
13.0 mJy rms noise in the spectra, but is lowered to a
measured value of 11.0 mJy because more than one
channel may contribute to the measurement of Sp. The
error on Sp is not found to depend on any of the other
parameters, so we adopt a fixed value of σ(Sp) = 11.0 mJy.
The other effect that can be seen in this panel is that there
is a global offset of 5.0 mJy in the measured Sp with respect



Table 2. Reliability

parameter                      reliability                                  C = 0.95   C = 0.99
Sp (mJy)                       erf[0.040(Sp - 12)]                          50         58
Sint (Jy km s^-1)              erf[0.12(Sint + 6.4)]                        5.0        8.2
Sp (mJy), Sint (Jy km s^-1)    erf[0.045(Sp - 12)] erf[0.20(Sint + 6.4)]

Figure 7. Reliability of HiCAT measured from Parkes follow-up observations. Solid circles show the differential reliability, triangles the
cumulative reliability. The crosses show the reliability of HiCAT before unconfirmed sources have been taken out. The solid lines are error
function fits to the points, with the fit parameters given in Table 2. Error bars indicate 68 per cent confidence levels.



to the peak flux of the synthetic sources. This effect, which 
in Zwaan et al. (2003) is referred to as the 'selection bias', 
arises because after adding noise to a spectrum the mea- 
sured peak flux density is generally an overestimation of the 
true peak flux density. 
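
The origin of this positive offset can be seen in a small Monte Carlo experiment. The sketch below is purely illustrative: it assumes a Gaussian line profile, takes the measured peak to be the highest channel, and uses hypothetical values for the true peak flux, the profile width and the number of channels. It shows only the sign of the effect and does not attempt to reproduce the 5.0 mJy offset measured for HiCAT.

```python
import math
import random


def simulate_peak_bias(true_peak=60.0, fwhm_channels=8.0, rms=13.0,
                       n_channels=32, n_trials=2000, seed=2):
    """Mean of (measured peak - true peak) over many noise realisations.

    All default values are illustrative assumptions; only the 13.0 mJy
    rms noise is taken from the text.
    """
    rng = random.Random(seed)
    sigma_ch = fwhm_channels / 2.3548  # FWHM to Gaussian sigma
    centre = n_channels / 2
    profile = [true_peak * math.exp(-0.5 * ((i - centre) / sigma_ch) ** 2)
               for i in range(n_channels)]
    total = 0.0
    for _ in range(n_trials):
        # The measured peak is the maximum over all noisy channels.
        measured = max(s + rng.gauss(0.0, rms) for s in profile)
        total += measured - true_peak
    return total / n_trials


mean_bias = simulate_peak_bias()
print(f"mean peak-flux bias = {mean_bias:.1f} mJy")
```

Because the maximum is taken over several noisy channels near the line centre, upward noise fluctuations are selected preferentially and the mean bias is positive.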

The error in Sint is found to be dependent on Sint only,
and can be satisfactorily fitted with c1 = 0.5, n = 1/2. This
implies that σ(Sint) ≈ 1.5 Jy km s^-1 (or 16 per cent) at the
99 per cent completeness limit of 9.4 Jy km s^-1. Fouqué et
al. (1990) derive that σ(Sint) is dependent on both Sint and
Sp as σ(Sint) ∝ Sint^(1/2) Sp^(-1/2). Our analysis shows that for
the HiCAT data σ(Sint) can be described satisfactorily as a
function of Sint only. The error in Sint will be the dominant
factor in the error on the H I mass, except for the nearest
galaxies for which peculiar velocities contribute significantly
to the uncertainty in H I mass.

The error in W50 is not clearly dependent on any other
parameter, so we adopt a constant σ(W50) = 7.5 km s^-1.
It should be noted, however, that there appears to be an
excess of points which is not satisfactorily fitted with a single
Gaussian. These outliers are preferentially those with low
peak fluxes, but large velocity widths. Larger uncertainties
in the measurements of velocity width occur with broad, low
signal-to-noise profiles, because the edges of the profiles
cannot always be chosen unambiguously. Approximately
one-third of the measurements can be fitted with a Gaussian
with σ = 25 km s^-1.



We find that the error on recessional velocity is
dependent on Sp only, with higher peak flux detections having
lower errors on the measured Vhel. The error bars can be
fitted with parameters c1 = 1.0 × 10^4, n = -2 and c2 = 5.
Fouqué et al. (1990) find for their data that n = -1, but
incorporate an additional dependence on the steepness of the
H I profile.

In Fig. 9 the top panel shows the difference between the
position of the inserted synthetic sources and the fitted
position after parametrization. The lower two panels show the
position differences as a function of Sint. The positional
accuracy in RA is fitted with c1 = 5.5, n = -1, c2 = 0.45, and
the accuracy in Dec is fitted with c1 = 4, n = -1, c2 = 0.4.
These numbers imply that the positional accuracy at the 99
per cent completeness limit is 1.05 arcmin in RA and 0.82
arcmin in Dec. The difference between these two numbers arises
because the HiPASS data are more regularly sampled in the Dec
direction (see Barnes et al. 2001). This positional accuracy agrees
very well with the results found from HiCAT matching with
the 2MASS Extended Source Catalogue (Jarrett et al. 2003,
see Meyer et al. 2004b).
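
The fitted relations of this subsection all share the form of Eq. 13 and can be collected in a few lines of Python. This is a minimal sketch with hypothetical function and key names; the coefficients are those quoted in the text, with c2 = 0 assumed for σ(Sint) because no offset is quoted for that fit.

```python
def sigma_model(y, c1, n, c2):
    """Eq. 13: sigma(X) = c1 * y**n + c2."""
    return c1 * y ** n + c2


def hicat_errors(sp_mjy, sint):
    """1-sigma errors for a source with peak flux Sp (mJy) and
    integrated flux Sint (Jy km/s), using the fits quoted in the text."""
    return {
        "Sp_mJy": 11.0,                                      # constant
        "Sint_Jy_km_s": sigma_model(sint, 0.5, 0.5, 0.0),    # c2 = 0 assumed
        "W50_km_s": 7.5,                                     # constant
        "Vhel_km_s": sigma_model(sp_mjy, 1.0e4, -2, 5.0),
        "RA_arcmin": sigma_model(sint, 5.5, -1, 0.45),
        "Dec_arcmin": sigma_model(sint, 4.0, -1, 0.4),
    }


# Evaluate at the 99 per cent completeness limits (84 mJy, 9.4 Jy km/s).
errors = hicat_errors(84.0, 9.4)
for name, value in errors.items():
    print(f"{name}: {value:.2f}")
```

Evaluated at the 99 per cent completeness limits, these expressions give values close to those quoted in the text and in Table 3, for example about 1.5 Jy km s^-1 for σ(Sint) and 6.4 km s^-1 for σ(Vhel).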



5.2 Verification of error calculations with Parkes 
follow-up observations 



Although the comparison with noise-free parameters in the
previous subsection is a useful method of estimating the
errors on HiCAT parameters, it is important to verify these
results with independent measurements. Such measurements
are available through our program of Parkes narrow-band
(NB) follow-up observations, which was described earlier in
this paper. These follow-up observations are preferentially
targeted at sources with low integrated fluxes, but the sample
is sufficiently large to make a meaningful parameter
comparison over a large dynamic range. The NB observations
were carried out independently from the HiPASS program
and consisted of pointed observations instead of the active
scanning used for HiPASS. The spectral resolution of the NB
observations was 1.65 km s^-1, compared to 13.2 km s^-1 for
HiPASS, but the data used in this section were smoothed to
the HiPASS resolution. The NB profiles were parametrized
with the same miriad software used for HiCAT.

Figure 8. Comparison between HiCAT and synthetic source
parameters. The left panels show the histograms of the differences,
which are fitted by Gaussian profiles for which the parameters
are indicated in the top left corners. The right panels show the
differences as a function of the measured HiCAT parameters. The
points and error bars show the zero-points and the widths of the
fitted Gaussians (inner and outer error bars indicate 1σ and 3σ,
respectively). The dashed lines are the best-fitting analytical
descriptions of 3σ(X).

Table 3. Parameter uncertainties

parameter   error                                σ at C = 0.99
σ(Sp)       11.0 mJy                             11.0 mJy
σ(Sint)     0.5 Sint^(1/2) Jy km s^-1            1.5 Jy km s^-1
σ(W50)      7.5 km s^-1                          7.5 km s^-1
σ(Vhel)     1.0 × 10^4 Sp^-2 + 5 km s^-1         6.4 km s^-1
σ(RA)       5.5 Sint^-1 + 0.45 arcmin            1.05 arcmin
σ(Dec)      4 Sint^-1 + 0.4 arcmin               0.82 arcmin

Figure 9. The top panel shows the difference between the
position of the inserted synthetic sources and the fitted position after
parametrization, in arcmin. The lower panels are similar to those
in Fig. 8.

In Fig. 10 the differences between HiCAT parameters
and those from follow-up observations are presented. The
dashed lines in the right-hand panels are not fits to the error
bars, but are the equations given in Table 3 converted using
Eq. 14 and Eq. 15. Here we have adopted rms_NB ≈ 7 mJy,
which is the mean rms noise in the follow-up spectra after
smoothing these to the HiPASS resolution. The converted
error estimates provide good fits to the measured scatter,
indicating that the equations in Table 3 can be used to find
reliable errors on HiCAT parameters. We note that the errors
on the peak and integrated flux include the uncertainties in
the calibration of the flux scale, except for errors in the Baars
et al. (1977) flux scale.
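
The conversion applied to the dashed lines can be written out numerically. The sketch below assumes that rms_HiCAT in Eq. 15 is the 13.0 mJy spectral rms noise quoted in Section 5.1 and uses the adopted rms_NB of 7 mJy; the function and constant names are hypothetical.

```python
import math

RMS_HICAT_MJY = 13.0  # assumed: HiPASS spectral rms noise from Section 5.1
RMS_NB_MJY = 7.0      # adopted rms noise of the smoothed follow-up spectra


def expected_scatter(sigma_hicat, rms_z=RMS_NB_MJY, rms_hicat=RMS_HICAT_MJY):
    """Expected rms scatter on (X_HiCAT - X_Z) from Eq. 14 and Eq. 15.

    Eq. 15 gives sigma_Z = sigma_HiCAT * rms_Z / rms_HiCAT, and Eq. 14
    adds the two errors in quadrature.
    """
    sigma_z = sigma_hicat * rms_z / rms_hicat
    return math.sqrt(sigma_hicat ** 2 + sigma_z ** 2)


# Constant errors on peak flux (mJy) and velocity width (km/s).
scatter_sp = expected_scatter(11.0)
scatter_w50 = expected_scatter(7.5)
print(f"expected scatter: Sp {scatter_sp:.1f} mJy, W50 {scatter_w50:.1f} km/s")
```

Under these assumptions the expected scatter in the HiCAT minus NB comparison is about 14 per cent larger than the HiCAT error alone, since sqrt(1 + (7/13)^2) ≈ 1.14.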



5.3 Parameter comparison with Bright Galaxy 
Catalogue 

The Bright Galaxy Catalogue (BGC, Koribalski et al. 2004)
consists of the 1000 HiPASS galaxies with the highest peak
fluxes and is assembled and parametrized independently
from HiCAT, but is extracted from the same data cubes.
By comparing the HiCAT parameters with those from the
BGC, it can be determined what fraction of the error on the
HiCAT parameters is determined by the parametrization
procedure (internal error), and what fraction is caused by noise
in the HiPASS data (external error). This comparison is
particularly interesting because the parametrization of a
21-cm emission line profile generally involves a number of
choices, which could differ between the persons doing the
parametrization. The biggest uncertainty is probably the
fitting and subtracting of the spectral baseline. Structure
in the baseline is caused by ringing associated with strong
Galactic H I emission and continuum emission that can
produce standing wave patterns in the telescope structure. For
the BGC, the spectral baselines were fitted with polynomials,
of which the order is a free parameter, whereas HiCAT
baselines were fitted with Gaussian smoothing, where the
dispersion is a free parameter (see Paper I). Another
uncertainty is introduced with the choice of the velocity extrema
of line profiles, between which the flux is integrated.

Figure 10. Similar to Fig. 8 but showing the comparison
between parameters from HiCAT and narrow-band follow-up
observations.

Figure 11. Similar to Fig. 8 but showing the comparison
between parameters from HiCAT and the HiPASS Bright Galaxy
Catalogue (Koribalski et al. 2004).

In Fig. 11 the comparison with the BGC is shown.
The difference between HiCAT and BGC parameters is very
small. There are no systematic trends, except for a slight
excess of points with high values of Sint(HiCAT) - Sint(BGC)
at large values of Sint. This excess arises because HiCAT and
the BGC use different criteria to define what is an extended
source. This leads to more sources in HiCAT being fitted as
extended, which generally results in higher values of Sint.
Overall, we find that the parametrization error contributes
only marginally to the total error, with a contribution of 8
per cent to σ(Sp), 13 per cent to σ(Sint), and 1 per cent
to σ(W50). The contribution to σ(Vhel) is not uniquely
defined because it depends on Sp, but on average it is 13 per
cent. The rms scatter on the difference between the BGC
and HiCAT values of Vhel is only 4.8 km s^-1 at the 99 per
cent completeness limit and drops to 2 km s^-1 for brighter
sources.



6 SUMMARY 

The full catalogue of extragalactic HiPASS detections
(HiCAT) has now been released to the public (Meyer et al.
2004a, Paper I). In the present paper we have addressed
in detail the completeness and reliability of the survey. We
present analytical expressions that can be used to
approximate the completeness and reliability. We find that HiCAT
is 99 per cent complete at a peak flux of 84 mJy and an
integrated flux of 9.4 Jy km s^-1. The overall reliability is
95 per cent, but rises to 99 per cent for sources with peak
fluxes > 58 mJy or integrated flux > 8.2 Jy km s^-1.
Expressions are derived for the uncertainties on the most
important HiCAT parameters: peak flux, integrated flux, velocity
width, and recessional velocity. The errors on HiCAT
parameters are dominated by the noise in the HiPASS data, rather
than by the parametrization procedure.